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ABSTRACT 

Observed galaxy clustering exhibits local transverse statistical isotropy around the line- 
of-sight (LOS). The variation of the LOS across a galaxy survey complicates the measure¬ 
ment of the observed clustering as a function of the angle to the LOS, as fast Fourier trans¬ 
forms (FFTs) based on Cartesian grids, cannot individually allow for this. Recent advances 
in methodology for calculating LOS-dependent clustering in Fourier space include the re¬ 
alization that power spectrum LOS-dependent moments can be constructed from sums over 
galaxies, based on approximating the LOS to each pair of galaxies by the LOS to one of 
them. We show that we can implement this method using multiple FFTs, each measuring the 
LOS-weighted clustering along different axes. The N log N nature of FFTs means that the 
computational speed-up is a factor of > 1000 compared with summing over galaxies. This 
development should be beneficial for future projects such as DESI and Euclid which will 
provide an order of magnitude more galaxies than current surveys. 
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1 INTRODUCTION 

Although the Universe is predicted to be statistically homoge¬ 
neous and isotropic, observational effects including the Alcock- 
Paczynsky effect (AP; Alcock & Paczynski 1979) and redshift- 
space distortions (RSD; Kaiser|1987 1 mean that the observed clus¬ 
tering, when translated into comoving coordinates using a fidu¬ 
cial distance-redshift relation exhibits local transverse statistical 
isotropy around the line-of-sight (LOS). The key measurement to 
be made from a galaxy survey is consequently the clustering as a 
function of the angle to the LOS. If we consider the clustering in 
configuration-space, then the base ‘unit" is a pair of galaxies, and it 
is common to treat a pair as having a single LOS, usually defined 
as the direction to the pair centre. Any effects because the galaxies 
within the pair have different LOSs are called ‘wide-angle effect’ 


( Szalay, Matsubara & Landy 

1998 

Szapudi 2004) and are small 

of the scales of interest (Beut 

er et a 

. 2012; Samushia, Percival & 

Raccanelli 2012, Yoo & Selja 

k 2015 

. Thus in configuration space, 


measuring clustering with respect to the LOS can be easily incor¬ 
porated into pair-counting algorithms ( |Landy & Szalay|1993| ) with 
a different LOS for each pair. 

In Fourier-space, dealing with the varying LOS is more diffi¬ 
cult, as fast Fourier methods do not, in general, allow for the varia¬ 
tion of LOS. One option is to use a basis built up from spherical 
harmonics and Bessel functions, which naturally separates clus- 


1994; 

Heavens & Taylor|l995 

(2006 

and Blake et al. (2011 


considered the Fourier decompo¬ 
sition as a sum over pairs of galaxies, and showed that this can 
be simplified (and speeded up) by assuming that the LOS to the 
pair is equivalent to the LOS to a single galaxy (the method is de¬ 
scribed in §[2]i. This approximately doubles the ‘wide-angle effect’ 
( jSamushia, Branchini & Percival|20T5| , but that is small anyway. 
In this Letter we consider how to implement the transform with 
this approximation, showing that we can use multiple fast Fourier 
transforms (FFTs) to perform this sum for power-law moments in 
ft, = k • Flos, the cosine of the angle to the LOS (this is described 
in § |3j. In § [4] we present the results of tests of three implemen¬ 
tations of the method, summing over galaxies, grid cells or using 
FFTs. We show that they provide consistent results, and compare 
the computational burden of each. By decomposing any moment 
into a sum over Legendre polynomials, we can construct any power 
spectrum moment using this method (§|5j. Such developments are 
necessary as one often wants to measure the power spectrum mo¬ 
ments, not only in the data, but also in a large numbers of mock cat¬ 
alogues used to estimate and test for errors: for example, | Anderson] 
|et al.| ( [20l4|> analysed the Ba ryon Oscillation Spectroscopic Survey 
(BOSS; Dawson et al.|20 1 3) data and 1000 mock catalogues. Thus 
the computational burden of measuring LOS-dependent clustering 
is high. 
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2 METHOD 


We start by defining the function (jFeldman, Kaiser & Peacock| 

[T994| , 


F(r) 


w(r) 
1 1/2 


[n(r) 




0) 


where n and n„ are, respectively, the observed number density 
of galaxies and the number density of a synthetic catalog of ran¬ 
doms, Poisson sampled with the same mask and selection function 
as the survey with no other cosmological correlations, and w is the 
weight, a normalizes the weighted random catalogue to match the 
weighted galaxy catalogue. The factor I normalizes the amplitude 
of the observed power in accordance with its definition in a uni¬ 
verse with no survey selection, I = f drw 2 n 2 (r). From Eq. |TJ 
we can define the multipole power spectrum estimator as l |Feldman7| 
|Kaiser & Peacock|1994| | Yamam oto et al.|2006} . 


Aw = 2 ^ / ^ 


/*■ / 


dr 2 F(r 1 )F(r 2 ) 


x e ik ' (ri - r2, £f(k ■ r„) - Pr ise (k)] , (2) 


where = (ri -\-r 2 )/2 denotes the LOS of the pair of galaxies ri 
and r 2 , dPlk is the solid angle element in fc-space, Ci is the l— th 
order Legendre polynomial and P“ olse i s the shot noise term given 

by 


in Beutler et al. 2014). In this Letter, we show that the efficiency 
of this estimator can be further improved by making use of FFT 
algorithms, such as FFTwQ 


3 FFT IMPLEMENTATION 

Here we show how to write the Yamamoto algorithm in terms 
of Nk log(iV) processes for any multipoles. For simplicity and 
with no loss of generality, we focus on the monopole (which, 
as discussed in ^2] reduces to the standard FKP description), the 
quadrupole and the hexadecapole. We proceed by defining the con¬ 
venient function, 


A n ( k) = 

= J dr (k ■ r ) F( r)e lk r . 

(5) 

With this, Eq. (4} reads. 



P 0 Yama ( fc ) =* i J 

^ [A 0 (k)A* 0 (k)]-pr se 

(6) 

P 2 Yama (fc) = A j 

fd ^A 0 (k) [3AKk)-AS(k)] ; 

(7) 

PJ^ik) = ^j 

fd ^A 0 (k) [35Al(k)-30A2(k) 


+ 3A 0 (k)]. 

(8) 


P“ olsc (k) = (1 + a) J drn{r)w 2 {r)C l {k ■ f) . (3) 

For multipoles of order £ > 0, P™ se -C Pe, and consequently the 
shot noise correction is negligible. 

Denoting the number of fc-modes that we want to evaluate by 
Nk and the number of elements that we use to perform the integral 
over ri or r 2 by N, we see that the computation of Eq. 0 will be 
of order Nk x N 2 , as the integrals in ri and r 2 are not separable. 
In effect this approach performs a pair-wise clustering analysis and 
translates into Fourier-space. As N increases the total time needed 
to evaluate Eq. § grows dramatically. 

The FKP-estimator ( jFeldman, Kaiser & Peacock! 1994} uses 
the fact that the monopole is independent of the LOS, so the in¬ 
tegrals are separable and FFTs are trivial to apply. Consequently, 
the Nk x N 2 process becomes a Nk log(N) one, which it is easier 
to handle: here N is the number of grid cells at which we sample 
F, so for a FFT N = Nk - This estimator has been successfully ap¬ 
plied in many galaxy surveys to estimate the power spectrum and 
bispectrum monopoles (see e.g. |Gil-MarIn et al.|2015| and refer¬ 
ences therein). 

The Yamamoto estimator ( Yamamoto e t al.|2006(|Beutler et al.j 
|2014| keeps the relevant LOS information by approximating the 
LOS of each pair of galaxies with the LOS of one of the two galax¬ 
ies, Ce (k ■ r h) — £e {k ■ r 2 ), which yields 

= (21 + 1) £ dQ k 

I J Air 

x J dr 2 F(r 2 )e ~ lk ' T2 £e{k ■ r 2 ) - P“ olsc (k) . 

(4) 

This is a reliable approximation on the scale of interest, which 
clearly improves on assuming a single fixed LOS for the whole sur¬ 
vey for l > 0, but will eventually break down at very large scales 
(Samushia, Branchini & Percival 2 013] [Yoo & Seljak|2015fr . The 

integrals over ri and r 2 in Eq. (|4b are separable, so P e ama be¬ 
comes a Nk x N process if the integrals are solved using sums (as 


J d ri F(ri)e lk ri 


Note that the expressions for A 2 and A 4 include a fc-dependent 
term (kf)" in the integrand, which means that in this form Fourier 
transforms cannot directly be applied. This is the standard problem 
of dealing with a varying LOS across a survey. However, by means 
of the trivial decomposition 

C * k x r x + k y r y + k z r z 

k r =- “-, (9) 

kr 

A 2 can be easily re-written into a combination of smaller building 
blocks, 

A a (k) = ^ {k 2 x B xx ( k) + k 2 y B yy { k) + k 2 z B zz { k) 

—h 2 \k x kyB x y (k) -(- k x k z B xz (k£j -\- kyk z By Z (k^ } , 

( 10 ) 


where 

Bij(k) = J dr ^-F(r)e* k ' r . (11) 


Similarly, for A 4 we obtain. 


A 4 (k) 


^{k 4 x C xxx 

+ kyCyyy + 

ktc z 

ZZ 

+ 

4 hyCxxy + ^x^zCxxz H - 

kyk X Cyy X 

+ 

kykzCy-yz 

+ k z k x Czzx 

+ k 3 z 

kyCzzy ] 

+ 

6 \k x kyCxyy + k x k z c* 

'.ZZ + 

k 2 yk 2 z Cyzz\ 

+ 

12 k x ky kz 

\k x C X yz + ki 

j Cy X 2 

i + kzCzxy 


where 

Ctfi(k) = J dr ^F(r)e ,k r . (13) 

Ao, Bij and C',,; are all Nk log(JV) processes by the use of any 
FFT algorithm. This provides the value of monopole, quadrupole 
and hexadecapole with only 1, 7 (= 1 + 6) and 22 (= 1 + 6 + 15) 


1 Fastest Fourier Transform in the West: http://fftw.org 
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FFTs, respectively. Similar decompositions are possible for higher 
order multipoles. 

It is important to remark that, from an analytical point of view, 
the above decomposition is completely equivalent to Eq. j4|, i.e. it 
does not involve any further approximation. In essence, the symme¬ 
try encoded in the Yamamoto estimator of Eq. 0 is exactly cap¬ 
tured by including the variation of the LOS in the relative weighting 
of different galaxies to FFTs, each covering a different axis direc¬ 
tion, Eqs. < [ 1 0[ > and ( | 1 2| >. 


4 PERFORMANCE TESTS 

In this section we test the following three implementations of the 
Yamamoto estimator, solving Eq. 0 using the following. 

(i) A sum over galaxies and randoms (the total number of points 
is N) and the Nk fc-modes of interest. We will refer to this as sum- 
gal. 

(ii) A sum over a gridded representation of F with N grid cells, 
and the Nk fc-modes. We will only consider N = Nk although this 
is not fixed as for an FFT, and refer to this as sum-grid. 

(iii) An FFT-based implementation using a gridded representa¬ 
tion of F with N grid cells and the Nk = N fc-modes. We will 
refer to this as FFT-based. 


For the methods using sums we have optimised our code, min¬ 
imizing the computations performed within the inner most loops, 
and using the Hermitian symmetry in fc-space to reduce the number 
of fc-modes summed over. We also only compute power spectrum 
moments for k ^ 0.3ftMpc _1 for these methods. Additionally, for 
the sum-grid method we only include filled grid cells in the sum. 
We therefore consider that time taken by these algorithms is indica¬ 
tive of that achieved by most algorithms performing the transform 
using a sum. 

We will test these three options using the public mock galaxy 
catalogues matched to the CMASS galaxy sample of the Sloan Dig¬ 
ital Sky Survey (SDSS-III; lEisenstein et al.~|2 01 1), BOSS Data Re¬ 
lease 11 North Galactic Cap (Ma nera et al.|2013 1 . These catalogues 
each contain approximately 525,000 galaxies. We use the random 
catalogue provided with the galaxy mocks and we take the number 
density of the randoms to be 10 times higher than the number den¬ 
sity of the galaxies, i.e., a -1 ~ 10. For the implementations that 
use a grid, we place the galaxies and randoms in a cubic box of size 
Lj = 3500Mpc/i _1 using the Cloud-in-Cell (CiC) prescription, 
to obtain the quantity F( r) of Eq. 0. In order to correct for the 
effects of the grid left by the CiC scheme we have corrected appro¬ 
priately by the deconvolution window proposed in Jing ( |2005| l. 

Fig. □ displays the average power spectrum multipoles: 
monopole (red), quadrupole (blue) and hexadecapole (green) cal¬ 
culated from 50 mocks. The solid lines represent the FFT-based 
method, the dashed lines the sum-grid . and the dotted lines the sum- 
gal. The plot shows an almost exact agreement between the three 
implementations of Eq. 0. The results of the sum-grid algorithm 
show a few percent deviation at small scales. The origin of this is 
aliasing, which we have not corrected for. The aliasing effect for 
a 1024 3 grid is negligible for scales k ^ 0.4/iMpc -1 , and conse¬ 
quently does not appear for the FFT-based scheme. For compari¬ 
son, adopting a 2048 3 grid we expect the aliasing to be negligible 
for wave numbers up to ~ 0.6/iMpc -1 . Due to its small ampli¬ 
tude, at small k the hexadecapole is affected by numerical noise, 
which results in a general instability of the ratio between different 
methods. 



0.01 0.1 0.2 0.3 

k [hMpc" 1 ] 


Figure 1. Top panel: power spectrum multipoles: monopole (blue lines), 
quadrupole (red lines) and hexadecapole (green lines), obtained from the 
average of 50 realization of PTHALOS mocks corresponding to the BOSS 
DR11 CMASS NGC survey geometry. The solid lines display the compu¬ 
tation of Eq. 0 using the FFT-based method using 1024 3 grid cells. The 
dashed and dotted lines display the computation of the Yamamoto estimator 
using the sum-grid (with 512 3 cells) and sum-gal methods, respectively. In 
both these cases an orthonormal base of 512 3 fc-vectors has been used. The 
bottom panels show the corresponding sum-gal and sum-grid multipoles 
divided by the FFT-based multipoles to highlight differences among these 
implementations. 


In Table [T] we show a comparison between the computa¬ 
tion times of the different algorithms of Fig. [T] for the monopole, 
quadrupole and hexadecapole of one realization of the DR 11 
CMASS NGC mocks. For the FFT-based implementation, we also 
show the computation times for different number of cells used. If 
we relax our assumption of 10 times randoms, and use X rarl times 
as many randoms as galaxies (for example, |Anderson et al.|2014| 
used .Y ran = 50), then the computational time taken for sum-gal 
scales by approximately (A' ran + 1)/11. For multiple measure¬ 
ments for different catalogues that use the same randoms, then the 
time in the table reduces by a factor 1/11 for each catalogue where 
the randoms do not have to be reused. However, note that in the 
post-reconstruction analyses of |Anderson et al.| ( [2014| l, the randoms 
are uniquely matched to each galaxy catalogue and so have to be 
calculated for each mock. The sum-grid method does not scale with 
the number of randoms, and is therefore faster than sum-gal when 
the number of randoms to be analysed is larger. Finally, when com¬ 
paring run times, note that for sum-gal there is no aliasing as the 
galaxies and randoms are not placed on a grid, so we can use the 
same Nk to push to smaller k than the grid-based routines. Even 
allowing for these scalings, it is clear that the FFT-based method is 
significantly faster (approximately 1000 times) than either sum-gal 
or sum-grid for reasonable assumptions of grid size and number of 
randoms. 


5 GENERAL MOMENTS OF THE POWER SPECTRUM 

The trick of splitting /x n into Cartesian components employed in 
Eq- 0 will not work directly on moments of more general func¬ 
tions of /r. However we can still use a FFT-based method by de¬ 
composing the functions into Legendre polynomials and summing 
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FFT-based ( 512 3 ) FFT-based (1024 3 ) FFT-based (2048 3 ) sum-gal (512 3 ) sum-grid (512 3 ) 

Time (min) 1.2 7.5 72.5 ~ 1800 ~ 2400 

Table 1. Computation times (in minutes, using 16 processors) for the power spectrum monopole, quadrupole and hexadecapole for the three different imple¬ 
mentations of the Yamamoto algorithm. For the FFT-based implementation we show the number of grid cells used: 512 3 , 1024 3 and 2048 3 . For the sum-gal 
algorithm the computation times are assuming a -1 ~ 10 and for both sum-gal and sum-grid algorithms only computing for k ^ 0.3/iMpc —1 . 



Figure 2. Top panel: power spectrum ‘Wedges’: perpendicular-to-the-LOS 
power spectrum monopole, P± (blue lines) and parallel-to-the-LOS power 
spectrum monopole (red lines) obtained from the average of 50 realiza¬ 
tion of PTHALOS mocks corresponding to the BOSS DR 11 CMASS NGC 
survey geometry. The solid lines display the approximation presented by 
Eq. i |16[17) using the monopole, quadrupole and hexadecapole computed 
by using the FFT-based method described in <j3] placing the particles in 
1024 3 grid cells. The dashed lines display the computation of the “Wedges” 
using sum-gal and Eq. l |14|15} , so the sum is exact. In this case an orthonor¬ 
mal base of 512 3 fc-vector have been used. The bottom panels show the 
fractional differences between the sum-gal and the FFT-based method, for 
P± and P|| as labeled. 


over the multipole-moments. For example, one proposed alterna- 
multipoles is to use “Wedges” jKazin, Sanchez &| 
), where we replace i n Eq. Q by top hat func¬ 
tions in fi covering 0 ^ /x ^ 0.5, whose moment we denote P± 
and 0.5 < n ^ 1 whose moment we denote P \\: 

P±(k) = f/ 2 "S/° 5 ^[ A o(kMS(k)]-Po noise , (14) 

P\\(k) = 7 r~ f 1 dll [A 0 (k)A* 0 (k)]-Pr se , (15) 

7 Jo ^ J 0.5 

where 95 is the azimuthal angle. Then, as discussed in |Kazin,| 
[Sanchez & Blanton (2012), we can approximate these functions 
using the first three even Legendre polynomials as, 

P ± (k) ~ p 0 (fc)-l|p 2 (fc) + i|p 4 (fc), (16) 

P\\ik) ~ Po(k) + §P 2 (fc) - -^P 4 (*). (17) 

In Fig. [2] we show the comparison between the P± (blue lines) 
and P|| (red lines) computed using the sum-gal algorithm (dashed 
lines), i.e. the definition of Eq. (| 14| 1 5|), and the combination of 
Eq. ( |16|17[ > computed using the FFT-based algorithm (solid lines). 
The agreement between the definition of P± and Pm and the ap¬ 


proximation of Eq. (T6p7| i is very good for the range of scales 
studied. This suggest that the Yamamoto implementation based on 
FFTs presented in this Letter is also suitable to be used to compute 
the wedges power spectral moments. 


6 CONCLUSIONS 

We have explored methods for implementing the calculation of 
LOS-dependent moments of the galaxy power spectrum. Follow¬ 
ing on from developments in Yamamoto et al. (2006) and Blake 
|et al.| pOl lfr we have shown that the resulting equation can be 
solved using multiple FFTs, thus providing a fast method to mea¬ 
sure LOS-dependent clustering. We have shown that this is faster 
than previous methods using sums over galaxies, and this will also 
be faster than pair-counting algorithms based on the |Landy & Sza-| 
|lay| ( |1993] l algorithms to calculate configuration-space monopole, 
quadrupole and hexadecapole moments of the correlation function. 
Developments such as this are necessary given the next generation 
of galaxy redshift surveys, including DESI ([Levi et al. |2013 |> and 
Euclid (Laureijs et al. 2011), will provide an order of magnitude 
more galaxies than current surveys, and therefore make computa¬ 
tions more challenging. Developments such as that presented here 
should also find application in the measurement of the bispectrum, 
and contribute to our ability to fully exploit galaxy surveys to pro¬ 
vide cosmological information. 

After submission of our Letter and publication on the archive, 
a similar derivation appeared (Scoccimarro 2015 ). This addition¬ 
ally showed that the hexadecapole can be calculated from the FFTs 
used to estimate the quadrupole, using relationships of Legendre 
polynomials and a slightly different LOS approximation. 
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